3625: Add decompression size limit to prevent decompression bomb DoS
martin-augment wants to merge 1 commit into main
Conversation
Add maximum decompression size limit in DeflateCodec to prevent OutOfMemoryError when processing maliciously crafted Avro files with high compression ratios (decompression bombs). The limit defaults to 200MB and can be configured via the system property `org.apache.avro.limits.decompress.maxLength`.
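The idea the commit describes can be sketched standalone with `java.util.zip`. This is not Avro's actual `DeflateCodec`; the helper name `inflateWithLimit` and the 8 KB buffer size are illustrative assumptions, but the shape of the guard (count inflated bytes, fail once the limit is crossed) matches what the PR adds:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class InflateLimitSketch {

  // Hypothetical helper (not Avro's API): inflate 'input', failing as soon
  // as the decompressed output would exceed 'maxLength' bytes.
  static byte[] inflateWithLimit(byte[] input, long maxLength) throws IOException {
    Inflater inflater = new Inflater();
    inflater.setInput(input);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[8192];
    long total = 0; // long, so the running total cannot overflow int
    try {
      while (!inflater.finished()) {
        int len = inflater.inflate(buffer);
        if (len == 0) {
          // No progress: treat dictionary requests, truncated input, or a
          // stalled inflater as invalid data rather than looping forever.
          throw new IOException("Invalid or truncated deflate data");
        }
        total += len;
        if (total > maxLength) {
          throw new IOException("Decompressed size " + total + " exceeds limit " + maxLength);
        }
        out.write(buffer, 0, len);
      }
    } catch (DataFormatException e) {
      throw new IOException("Invalid deflate data", e);
    } finally {
      inflater.end(); // always release native zlib resources
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    // 1 MB of zeros compresses to roughly a kilobyte: a mild "bomb" shape.
    byte[] raw = new byte[1_000_000];
    Deflater deflater = new Deflater();
    deflater.setInput(raw);
    deflater.finish();
    ByteArrayOutputStream compressed = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    while (!deflater.finished()) {
      compressed.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    byte[] payload = compressed.toByteArray();

    // Within the limit: the full round-trip succeeds.
    System.out.println(inflateWithLimit(payload, 2_000_000).length);

    // Over the limit: rejected long before the full 1 MB is buffered.
    try {
      inflateWithLimit(payload, 100_000);
    } catch (IOException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The key property is that memory use is bounded by the limit plus one buffer, regardless of how extreme the compression ratio of the input is.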
Walkthrough

The DeflateCodec.decompress method is modified to enforce a maximum decompressed size limit through a configurable system property.

✨ Finishing touches
Code Review: Add decompression size limit to prevent decompression bomb DoS

Summary

This PR adds important security protection against decompression bomb attacks by limiting the maximum decompressed size. The implementation is generally sound, but it has several issues that should be addressed.

🔴 Critical Issues

1. Resource Leak - Inflater not properly managed

Location: the code calls

The original code used

Recommendation: Wrap the decompression logic in a try-finally block, or use the try-with-resources pattern if possible.

2. Integer Overflow Risk

Location:

totalBytes += len; // 'len' is int, 'totalBytes' is long

Severity: Low (edge case)
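The overflow concern can be made concrete with a standalone illustration (not code from the PR): an `int` accumulator wraps negative once the running total passes `Integer.MAX_VALUE`, while a `long` accumulator keeps counting correctly.

```java
public class OverflowDemo {
  public static void main(String[] args) {
    // Two ~1 GB chunks: their true sum (2_147_483_648) no longer fits in an int.
    int chunk = 1_073_741_824;
    int intTotal = 0;
    long longTotal = 0;
    for (int i = 0; i < 2; i++) {
      intTotal += chunk;   // wraps around to a negative value
      longTotal += chunk;  // 'chunk' is widened to long before adding, stays correct
    }
    System.out.println(intTotal);   // negative: a guard like 'total > max' would never fire
    System.out.println(longTotal);  // the true byte count
  }
}
```

This is why declaring `totalBytes` as `long` matters: with an `int` total, a crafted stream could wrap the counter negative and slip past the size check entirely.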
Actionable comments posted: 1
🧹 Nitpick comments (1)
lang/java/avro/src/main/java/org/apache/avro/file/DeflateCodec.java (1)
42-55: Harden max-length property parsing (negative/zero + SecurityException)

`getMaxDecompressLength()` currently accepts negative/zero values (making behavior surprising) and can throw `SecurityException` under restricted environments. Consider validating `> 0` and falling back safely.

Proposed diff:
```diff
 private static long getMaxDecompressLength() {
-  String prop = System.getProperty(MAX_DECOMPRESS_LENGTH_PROPERTY);
-  if (prop != null) {
-    try {
-      return Long.parseLong(prop);
-    } catch (NumberFormatException e) {
-      // Use default
-    }
-  }
-  return DEFAULT_MAX_DECOMPRESS_LENGTH;
+  final String prop;
+  try {
+    prop = System.getProperty(MAX_DECOMPRESS_LENGTH_PROPERTY);
+  } catch (SecurityException e) {
+    return DEFAULT_MAX_DECOMPRESS_LENGTH;
+  }
+  if (prop == null) {
+    return DEFAULT_MAX_DECOMPRESS_LENGTH;
+  }
+  try {
+    long parsed = Long.parseLong(prop);
+    return parsed > 0 ? parsed : DEFAULT_MAX_DECOMPRESS_LENGTH;
+  } catch (NumberFormatException e) {
+    return DEFAULT_MAX_DECOMPRESS_LENGTH;
+  }
 }
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lang/java/avro/src/main/java/org/apache/avro/file/DeflateCodec.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cursor Bugbot
🔇 Additional comments (1)
lang/java/avro/src/main/java/org/apache/avro/file/DeflateCodec.java (1)
23-29: Imports are fine; ensure the chosen exception type is intentional

Adding `AvroRuntimeException` means `decompress()` can now fail with an unchecked exception (despite `throws IOException`). If callers expect to handle failures via `IOException` only, consider throwing `IOException` for limit violations (or wrapping consistently at a higher layer).
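The caller-facing difference can be shown with a minimal standalone sketch (the `decompress` stub below is illustrative, not Avro's code): a caller that catches only the declared `IOException` never sees the unchecked exception.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

public class UncheckedEscapeDemo {

  // Stub standing in for a decompress method: declared to throw IOException,
  // but the limit breach surfaces as an unchecked RuntimeException instead.
  static ByteBuffer decompress(ByteBuffer data) throws IOException {
    throw new RuntimeException("Decompressed size exceeds maximum allowed size");
  }

  public static void main(String[] args) {
    try {
      decompress(ByteBuffer.allocate(0));
      System.out.println("ok");
    } catch (IOException e) {
      System.out.println("handled as IOException"); // the path callers expect
    } catch (RuntimeException e) {
      // Without this extra catch, the exception would propagate and crash the caller.
      System.out.println("escaped the IOException handler");
    }
  }
}
```

This is the behavior change the comment warns about: existing error-handling paths keyed on `IOException` silently stop covering the new failure mode.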
Changed hunk in DeflateCodec.decompress():

```diff
 public ByteBuffer decompress(ByteBuffer data) throws IOException {
+  long maxLength = getMaxDecompressLength();
   NonCopyingByteArrayOutputStream baos = new NonCopyingByteArrayOutputStream(DEFAULT_BUFFER_SIZE);
-  try (OutputStream outputStream = new InflaterOutputStream(baos, getInflater())) {
-    outputStream.write(data.array(), computeOffset(data), data.remaining());
-  }
+  byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
+  long totalBytes = 0;
+
+  Inflater inflater = getInflater();
+  inflater.setInput(data.array(), computeOffset(data), data.remaining());
+
+  try {
+    while (!inflater.finished()) {
+      int len = inflater.inflate(buffer);
+      if (len == 0 && inflater.needsInput()) {
+        break;
+      }
+      totalBytes += len;
+      if (totalBytes > maxLength) {
+        throw new AvroRuntimeException(
+            "Decompressed size " + totalBytes + " exceeds maximum allowed size " + maxLength
+                + ". This can be configured by setting the system property " + MAX_DECOMPRESS_LENGTH_PROPERTY);
+      }
+      baos.write(buffer, 0, len);
+    }
+  } catch (DataFormatException e) {
+    throw new IOException("Invalid deflate data", e);
+  }
+
   return baos.asByteBuffer();
 }
```
Fix “no progress” inflate states to avoid infinite loop / accepting truncated data
In the current loop, len == 0 can occur with needsDictionary() (infinite loop) and needsInput() while not finished() (currently break + returns partial output). These should be treated as invalid/incomplete deflate data and fail fast.
Proposed diff:

```diff
 try {
   while (!inflater.finished()) {
     int len = inflater.inflate(buffer);
-    if (len == 0 && inflater.needsInput()) {
-      break;
-    }
+    if (len == 0) {
+      if (inflater.needsDictionary()) {
+        throw new IOException("Invalid deflate data (requires a preset dictionary)");
+      }
+      if (inflater.needsInput()) {
+        throw new IOException("Invalid deflate data (truncated input)");
+      }
+      // Defensive: avoid spinning if inflater makes no progress
+      throw new IOException("Invalid deflate data (stalled inflater)");
+    }
     totalBytes += len;
     if (totalBytes > maxLength) {
       throw new AvroRuntimeException(
           "Decompressed size " + totalBytes + " exceeds maximum allowed size " + maxLength
               + ". This can be configured by setting the system property " + MAX_DECOMPRESS_LENGTH_PROPERTY);
     }
     baos.write(buffer, 0, len);
   }
 } catch (DataFormatException e) {
   throw new IOException("Invalid deflate data", e);
 }
+
+if (!inflater.finished()) {
+  throw new IOException("Invalid deflate data (did not reach stream end)");
+}
 return baos.asByteBuffer();
 }
```
value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! Returning 0 inflated bytes needs more care to avoid infinite loop or unfinished reads. Both would be very problematic for the user applications. An infinite loop would lead to high consumption of a CPU core. Reading less data would lead to broken compressed data.
🤖 Augment PR Summary

Summary: This PR hardens Java DEFLATE decompression against decompression-bomb DoS by enforcing a maximum decompressed output size.

Changes:

Technical Notes: The new logic replaces

🤖 Was this summary useful? React with 👍 or 👎
```java
try {
  while (!inflater.finished()) {
    int len = inflater.inflate(buffer);
    if (len == 0 && inflater.needsInput()) {
```
inflate() can return 0 when the stream isn’t finished and it doesn’t need more input (e.g., needsDictionary()), which would make this loop spin forever; also, breaking on needsInput() while !finished() risks returning truncated output without error. Consider explicitly handling the len == 0 cases (dictionary/truncated data) to avoid hangs and silent corruption.
🤖 Was this useful? React with 👍 or 👎
value:useful; category:bug; feedback:The Augment AI reviewer is correct! Returning 0 inflated bytes needs more care to avoid infinite loop or unfinished reads. Both would be very problematic for the user applications. An infinite loop would lead to high consumption of a CPU core. Reading less data would lead to broken compressed data.
```java
    }
    totalBytes += len;
    if (totalBytes > maxLength) {
      throw new AvroRuntimeException(
```
This limit breach throws AvroRuntimeException even though decompress() is declared throws IOException; callers that currently catch IOException for decompression failures may miss this and crash. Consider using an IOException subclass (or otherwise documenting this behavior change) so error handling remains consistent.
🤖 Was this useful? React with 👍 or 👎
value:good-to-have; category:bug; feedback:The Augment AI reviewer is correct! The method signature declares that it would throw an IOException and this is what its callers would expect. Throwing a AvroRuntimeException may not be handled by the caller and lead to internal errors in the application.
value:annoying; category:bug; feedback:The Claude AI reviewer is not correct! The getInflater() method either creates a new instance or resets the reused instance, so it is not possible to use a "dirty" Inflater.
value:good-to-have; category:bug; feedback:The CodeRabbit AI reviewer is correct! Misconfiguration may lead to parsing a negative value for MAX_DECOMPRESS_LENGTH, and this would prevent any usage of the DeflateCodec. Validating early prevents finding the issue too late (at runtime instead of at start time).